Genome-wide exploration of sugar transporter (SWEET) family proteins in Fabaceae for sustainable protein and carbon source

Sugar transporter proteins (STPs) are membrane proteins required for sugar transport across cellular membranes. They play an essential role in sugar distribution throughout the plant and are determinants of crop yield. However, the Sugars Will Eventually be Exported Transporters (SWEET) family of STPs in legumes is still poorly documented. Therefore, an in-silico analysis of STPs was performed to unravel their cellular, molecular, and structural composition in legume species. This study conducted a systematic search for STPs in Cajanus cajan using the BLASTp algorithm to understand their molecular basis. Here, we performed a comprehensive analysis of 155 identified SWEET proteins across 12 legume species, namely Cajanus cajan, Glycine max, Vigna radiata, Vigna angularis, [...] stress responses. This study will be useful for examining photosynthetic productivity, embryo sugar content, seed quality, and yield enhancement in Fabaceae as a sustainable source of essential amino acids and carbon.

Introduction
Fabaceae, also called Leguminosae, is the third-largest plant family after the orchid and aster families. It is one of the most important plant families in economic and medicinal terms and consists of about 700 genera and 20,000 species of trees, shrubs, vines, and herbs. The Fabaceae family is most commonly found in tropical rainforests and dry forests in Asia, Africa, and Latin America [1]. This family includes Glycine max (soybean), Phaseolus (beans), Cicer arietinum (chickpea), Cajanus cajan (pigeonpea), Pisum sativum (pea), and many more [2]. In developing nations, a major portion of the population relies on legumes as a prime source of protein [3][4][5]. Besides proteins, legumes are abundant in carbohydrates, dietary fibers, fatty acids, and micronutrients such as vitamins and minerals [6,7]. Epidemiological studies reveal that regular intake of legumes can help prevent conditions related to HDL cholesterol, metabolic syndrome, and heart disease. Integrating legumes into a low glycemic index diet improves glycemic control and reduces the risk of coronary heart disease. Legumes are also used for grain, green manure, and timber, and for medicinal and industrial purposes [8,9].
Plants grow autotrophically, photosynthesizing with carbon dioxide, light, and water. Photosynthesis produces carbon for the growth and maintenance of non-photosynthetic organs, and the assimilated carbon is distributed throughout the plant, with sugar as the major transportable form of energy [10]. Sugars also work as signaling molecules, and plants have evolved ways to sense sugar availability and respond to nutritional status by changing gene expression and protein activity [11]. Sugar production, status, and transportation to different tissues influence plant growth, productivity, and yield [12]. Carbon is accumulated in plants as simple sugars, other carbohydrates, and starch. Sugars are then carried from the leaves (source tissue) to the roots, modified leaves, and reproductive tissues such as seeds (sink tissue) [13]. Sucrose is synthesized in the cytosol and translocated to non-photosynthetic tissues for direct metabolic use or conversion. The amount of sucrose available for transportation to sink tissue is key for plant growth and development [14,15]. STPs are required for sucrose to move efficiently across membranes because there are no symplastic linkages between maternal and filial tissues. Transferring sugar from maternal tissues to developing embryos is most likely accomplished via membrane-bound STPs [16]. Transporter proteins carry molecules across the plasma membrane in both active and passive transport modes. SUT (sucrose transporter) and SWEET (Sugars Will Eventually be Exported Transporter; sucrose effluxer) proteins control or facilitate the transport of sucrose [17,18].
SWEET proteins are a new class of sugar transporters that mediate sugar translocation across cell membranes. They are essential for sugar efflux, phloem loading, plant-pathogen interactions, reproductive tissue development, nectar production, and seed development. The SWEET family of sugar transporters has seven predicted transmembrane domains and two internal triple-helix bundles, resulting from a bacterial gene duplication [19].
To enhance crop yields and feed the growing global population, it is critical to understand how plants modulate carbon absorption and sugar transport by discovering the structures and functions of the proteins involved. This can be approached by exploring the available genome, transcriptome, and proteome information of Fabaceae. Here, we describe the protein structures and functions of 155 SWEETs from 12 Fabaceae species, identified against Cajanus cajan, with a focus on their amino acid profile, secondary structure, phylogenetic relationships, motif identification, and homology modeling. The findings of the present study may be useful for further structure assessment, identification of probable drug targets, gene expression analysis, cloning, and characterization of SWEETs in legumes.

Primary protein sequence detection
The primary sequences of the 155 SWEET proteins were analyzed using the ExPASy ProtParam tool (https://web.expasy.org/protparam/) and the BioEdit sequence alignment tool [26]. The analysis comprised amino acid composition, molecular weight, hydrophobicity, and hydrophilicity. Hydrophobicity and hydrophilicity were estimated using the Kyte and Doolittle mean hydrophobicity scale and the Bokyo mean hydrophobicity profile method. A window of defined size was moved along the sequence, the hydropathy scores were summed within the window, and the average (the sum divided by the window size) was taken for each position in the sequence. The physicochemical properties of each amino acid were analyzed with R packages and visualized using the ggplot2 package [27].
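The sliding-window averaging described above can be sketched in a few lines of Python. This is only an illustration and not the pipeline used in the study: the function name `hydropathy_profile`, the default window of 9 residues, and the toy sequence are assumptions made for the example, while the per-residue values are the published Kyte and Doolittle hydropathy scale.

```python
# Kyte and Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence.

    The window size is an assumed default; the source only states that
    a window of defined size was used.
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    profile = []
    for start in range(len(seq) - window + 1):
        window_sum = sum(scores[start:start + window])  # sum within the window
        profile.append(window_sum / window)             # average for this position
    return profile


if __name__ == "__main__":
    toy_sequence = "MLLVVTINGFGIVIEKKGRS"  # hypothetical sequence, for illustration only
    for position, value in enumerate(hydropathy_profile(toy_sequence), start=1):
        print(position, round(value, 2))
```

Positive averages mark hydrophobic stretches and negative averages mark hydrophilic ones; ambiguous residue codes are not handled in this sketch and would raise a `KeyError`.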

Multiple sequence alignment and phylogenetic tree
Multiple sequence alignment (MSA) was performed on the amino acid sequences of the 155 identified SWEET proteins using Clustal Omega (http://www.clustal.org) with default settings [28]. Clustal Omega uses the mBed algorithm to calculate guide trees for large or small sets of protein sequences. The phylogenetic tree was visualized using MEGA (Molecular Evolutionary Genetics Analysis) version 6.0 [29]. Evolutionary genetics analysis was performed using maximum likelihood, evolutionary distance, and maximum parsimony methods. A bootstrap analysis with 1000 replicates was conducted to determine the statistical stability of each node [30].
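The resampling step behind a bootstrap analysis can be illustrated as follows. This is a minimal sketch, not MEGA's implementation: it only shows how alignment columns are drawn with replacement to build pseudo-replicate alignments. Tree inference on each replicate and the counting of node support are left out, and the function names and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate).

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    # Draw the same random columns for every sequence so columns stay intact.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Return a list of resampled alignments (1000 by default, as in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    toy_alignment = {  # hypothetical aligned fragments, for illustration only
        "seq_a": "MLVT-NGF",
        "seq_b": "MLIT-NGY",
        "seq_c": "MIVTANGF",
    }
    replicates = bootstrap_replicates(toy_alignment, n_replicates=3)
    for replicate in replicates:
        print(replicate)
```

In a full analysis, a tree would be built from each replicate and the support of a node would be the fraction of replicate trees that contain it.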

Motif identification
The conserved regions within these SWEET proteins across different legume crops were identified using the MEME tool (meme-suite.org/tools/meme) [31]. MEME works by searching for repeated, ungapped sequence patterns in the protein sequences. It iteratively determines the width and number of occurrences of each motif so as to minimize the E-value of the motif. The E-value is an estimate of the number of equally well-conserved motifs expected in a similarly sized set of random sequences. All results were manually verified to confirm the output.

Secondary structure and homology modeling detection
To characterize the protein structure, we determined the secondary structure of all 155 SWEET proteins using the Proteus Structure Prediction Server [32]. The server predicts secondary structure in three states: helix, strand, and coil (loop or random coil). It was used to determine the alpha helix, beta bridge, random coil, beta-turn, extended strand, and ambiguous states of the SWEET proteins. The three-dimensional (3D) protein structures were modeled using I-TASSER [33] and MODELLER 9.18. MODELLER was run in 'Normal' modeling mode, which builds a model from template structures that have been experimentally solved. The protein sequences were aligned to a template structure, and the model was created from this alignment, the template's atomic coordinates, and a script file [34]. After modeling, the models were evaluated using MODELLER's normalized DOPE (Discrete Optimized Protein Energy) function.

Primary sequence and physicochemical properties analysis
Protein sequences in the other 11 Fabaceae species were deduced from the corresponding Cajanus cajan sequences. The length of the SWEET proteins ranged from 171 to 558 amino acids, with corresponding molecular weights between ~18.61 kDa and ~59.71 kDa. The shortest proteins were found in Spatholobus suberectus (171 amino acids) and Glycine max (172 amino acids), whereas the longest was found in Vigna radiata (558 amino acids) (Fig 1). The average length is 256 amino acids, and the average molecular weight is ~28.6 kDa. The amino acid analysis revealed that the 155 SWEETs are abundant in leucine, valine, isoleucine, phenylalanine, serine, and alanine residues and relatively low in cysteine, histidine, aspartic acid, glutamine, and tryptophan (Fig 2). The physicochemical analysis using ExPASy's ProtParam tool indicates that 72.26% of the SWEET proteins are stable. One important feature of any protein is its isoelectric point (pI), the pH at which the protein carries no net charge. The lowest pI value, 5.04, was found in Glycine max, and the highest, 9.82, in Cajanus cajan. A pI above 7 indicates that the protein reaches zero net charge at a basic pH. The number of negatively charged residues (Asp + Glu) ranges from 7 to 37, and the number of positively charged residues (Arg + Lys) from 12 to 49 (Fig 3, S1 Table). This information helps in predicting the topology of proteins [35]. The SWEET proteins are highly hydrophobic and thermostable owing to an abundance of nonpolar amino acids such as leucine, valine, phenylalanine, and alanine, which favors a globular protein shape (Fig 4). These properties can contribute to membrane and protein stabilization against various biotic and abiotic stresses in cell development, signal transmission, and osmotic homeostasis in plants [36,37]. Hydrophobic interactions play a key role in organizing and stabilizing the protein structure because these residues are evolutionarily conserved [38].
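The residue-level quantities reported above (amino acid composition and the counts of Asp + Glu and Arg + Lys) are simple to compute directly from a sequence. The sketch below is illustrative only and does not reproduce ProtParam; the function names and the toy sequence are assumptions, and derived values such as pI or the instability index are not calculated here.

```python
from collections import Counter


def composition_percent(sequence):
    """Return the percentage of each amino acid present in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def charged_residue_counts(sequence):
    """Return (negative, positive) counts: (Asp + Glu, Arg + Lys)."""
    seq = sequence.upper()
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive


if __name__ == "__main__":
    toy_sequence = "MLLVVTINGFGIVIEKKGRSDE"  # hypothetical sequence, for illustration only
    percentages = composition_percent(toy_sequence)
    for aa in sorted(percentages, key=percentages.get, reverse=True):
        print(aa, round(percentages[aa], 1))
    print(charged_residue_counts(toy_sequence))
```

Running the same two functions over all sequences of a family would give the kind of per-protein ranges summarized in the text.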

Secondary and tertiary structure prediction
The secondary structure of the 155 SWEET protein chains was analyzed using the Proteus Structure Prediction Server, which predicted the alpha helix, extended strand, beta-turn, and random coil content (Fig 5). The proteins are richer in α-helix (35.65%) than in random coil (33.47%) or extended strand (30.87%), suggesting that SWEET proteins are composed more of α-helices and random coils than of extended strands. The analysis revealed the predominance of α-helix and random coil, underlining the compact, strongly bonded, and transmembrane nature of the SWEET proteins (S1 Fig). The α-helix content of the SWEET proteins ranged from 14% to 63%, random coil from 20% to 47%, and extended strand from 10% to 52.63%. The α-helices are amphipathic and have been projected to form a water-accessible translocation pathway that is alternately accessible to extra- and intracellular sugar [39].
Fig 4. The plot shows the amino acid sequence of SWEET proteins on the X-axis and the degree of hydrophobicity on the Y-axis. The hydrophobicity index is a measure of an amino acid's relative hydrophobicity, or how soluble it is in water. https://doi.org/10.1371/journal.pone.0268154.g004
Fig 5. The graph shows the percentage of sequence on the X-axis and density on the Y-axis.
Plants depend on controlled sugar uptake for correct organ development and sugar storage, and apoplastic sugar depletion is a defense strategy against microbial infections such as rust and mildew. Recently, a 2.4 Å crystal structure of the plant symporter STP10 from Arabidopsis thaliana was determined. The structure explains high-affinity sugar recognition and suggests a proton donor/acceptor pair that links sugar transport to proton translocation. It contains a Lid domain, conserved in all STPs, that locks the mobile transmembrane domains through a disulfide bridge and creates a protected environment that allows efficient coupling of the proton gradient to drive sugar uptake. Plant STPs generally contain 12 structurally conserved transmembrane domains; a large cytoplasmic loop in the middle of the sequence divides the protein into two parts, each containing six transmembrane domains [40,41]. STPs are H+/sugar symporters and transport fructose, glucose, galactose, pentose, xylose, mannose, and ribose [42].

Comparative analysis of SWEET proteins across Fabaceae family
The availability of various legume genomes has provided an excellent opportunity to explore the phylogenetic and evolutionary dynamics of the SWEET protein family in Fabaceae species. Phylogeny and gene function are well associated in SWEET proteins, showing that amino acid-based phylogenetic analysis can predict a potential function of a SWEET protein [43,44]. We examined the phylogenetic relationships among the 155 SWEET proteins from Cajanus cajan, Glycine max, Vigna radiata, Vigna angularis, Medicago truncatula, Lupinus angustifolius, Glycine soja, Spatholobus suberectus, Cicer arietinum, Arachis ipaensis, Arachis hypogaea, and Arachis duranensis. The number of SWEET proteins found in different species is thought to be the outcome of gene expansion in distinct clades among these species [45]. To better understand their evolutionary ties, an unrooted phylogenetic tree was constructed. Based on the phylogenetic analysis, the SWEET proteins separate into seven groups (Group I to Group VII), each containing a different number of SWEET proteins. The SWEET family expanded significantly in Group I, Group III, and Group V. Group III contained the most SWEET proteins, 43 from five species (Glycine max, Glycine soja, Medicago truncatula, Vigna angularis, and Vigna radiata); Group I comprises 27 SWEET genes from three species (Vigna angularis, Cajanus cajan, and Glycine max); and Group V is a cluster of Medicago truncatula, Lupinus angustifolius, and Cicer arietinum (Fig 6).
Relationships were found among the 12 species in all groups except Group II and Group VII. Group II and Group VII comprise the fewest SWEET proteins, 10 and 11 respectively, from Cajanus cajan, Cicer arietinum, Spatholobus suberectus, and Lupinus angustifolius. The tree indicates that SWEET proteins share a significant degree of amino acid sequence similarity among Glycine max, Glycine soja, and Vigna angularis. This similarity indicates that the Glycine max, Glycine soja, and Vigna angularis sequences are related, account for a larger number of SWEET proteins, and share a common ancestor. These findings provide evidence for an evolutionary link. It was previously reported that sugar transporter genes exhibit divergent evolutionary patterns in monocots and eudicots, and that eudicot sugar transporter genes have higher frequencies of recent duplication than those of monocots [46].

Identification of conserved residues (motifs)
The motif discovery algorithm looks for similar short sequences (the needle) in a set of much longer sequences (the haystack). MEME was set to identify ten conserved regions (motifs). The amino acids in each motif are represented by different colors (Fig 7).
Among the 10 motifs, we found five that are more conserved across the 155 SWEET proteins. Motif 1 (FGLFLSPVPTFYRIIKKKSTEEFSSJPYIATLLNCLLWTWYG), motif 2 (VFNISMYASPLSIMKLVIKTKSVEFMPFFLSL), motif 3 (RDIFVAVPNGIGTLLGJJQLILYAIYRNK), and motif 4 (LLVVTINGFGIVIEIIYLLIFLIYAPKKGRVKTLK) are the most common (Fig 6). The SWEET motifs are abundant in the essential amino acids leucine, valine, isoleucine, and phenylalanine, as well as in serine, and are poor in the semi-essential amino acids cysteine and histidine and in the essential amino acid tryptophan (Figs 7 and 8).
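The needle-in-a-haystack idea can be illustrated by scanning sequences for a motif consensus string. This sketch is a deliberate simplification and is not how MEME works internally: MEME scores candidate sites with position-specific probability models, whereas the code below only looks for exact consensus matches, treating the ambiguity code J as leucine or isoleucine. The function names and toy sequences are hypothetical.

```python
import re

# Standard amino acid ambiguity codes; J (Leu or Ile) occurs in the reported motifs.
AMBIGUITY = {"J": "[IL]", "B": "[DN]", "Z": "[EQ]", "X": "."}


def consensus_to_regex(consensus):
    """Translate a consensus string into a regular expression pattern string."""
    return "".join(AMBIGUITY.get(c, re.escape(c)) for c in consensus.upper())


def find_motif(consensus, sequences):
    """Return {name: [0-based start positions]} for sequences containing the motif.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    hits = {}
    for name, seq in sequences.items():
        starts = [match.start() for match in pattern.finditer(seq.upper())]
        if starts:
            hits[name] = starts
    return hits


if __name__ == "__main__":
    toy_sequences = {  # hypothetical fragments, for illustration only
        "protein_1": "MSSLLGJQLIAAGTLLGILQLI",
        "protein_2": "MKKVVTTPPAA",
    }
    print(find_motif("GTLLGJJQLI", toy_sequences))
```

Exact matching of this kind is useful only for checking where a reported consensus occurs; discovering motifs de novo requires a statistical tool such as MEME.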

Conclusions
The availability of various Fabaceae crop genomes has provided a tremendous opportunity to explore the structural and functional dynamics of the STP family in Fabaceae. We [...] 11 from Glycine max, 10 from Arachis ipaensis and Cicer arietinum, 9 from Arachis duranensis. Although STPs and SWEET proteins have been well studied in several plants, their function in Fabaceae is still unclear owing to a lack of gold-standard genome sequencing data. We found that most of the SWEET proteins had similar conserved motifs rich in non-polar amino acids, while protein structure varied across the 155 SWEET proteins. Our analysis shows that the majority of SWEET proteins are in the stable phase, with the exception of 3 unstable SWEET proteins. The average instability index was 36.5. Based on the physicochemical analysis, SWEET proteins are 7-fold higher than negatively charged residues (Asp + Glu). These proteins are rich in α-helix, followed by random coil. This study can help us understand the evolution of the SWEET protein family in Fabaceae. SWEET proteins are rich in the essential amino acids leucine, valine, isoleucine, and phenylalanine, as well as in serine, which play vital roles in plant sugar transport, growth, development, and survival. The SWEET family has significant potential to enhance plant performance, especially crop yield, phloem loading of sucrose, reproductive organ development, seed filling, and senescence. Therefore, understanding the function and regulation of sugar transporters and their metabolic enzymes in legumes will help address global food insecurity and malnutrition, because legumes are a major source of essential amino acids.